Conceptual Proofs of L log L Criteria
for Mean Behavior of Branching Processes 

By Russell Lyons, Robin Pemantle, and Yuval Peres 
Abstract. The Kesten-Stigum Theorem is a fundamental criterion for the rate of
growth of a supercritical branching process, showing that an L log L condition is
decisive. In critical and subcritical cases, results of Kolmogorov and later authors give
the rate of decay of the probability that the process survives at least n generations.
We give conceptual proofs of these theorems based on comparisons of Galton-Watson
measure to another measure on the space of trees. This approach also explains
Yaglom's exponential limit law for conditioned critical branching processes via a simple
characterization of the exponential distribution.
§1. Introduction.

Consider a Galton-Watson branching process with each particle having probability $p_k$ of
generating $k$ children. Let $L$ stand for a random variable with this offspring distribution. Let
$m := \sum_k k p_k$ be the mean number of children per particle and let $Z_n$ be the number of particles
in the $n$th generation. The most basic and well-known fact about branching processes is that the
extinction probability $q := \lim P[Z_n = 0]$ is equal to 1 if and only if $m \le 1$ and $p_1 < 1$. It is also
not hard to establish that in the case $m > 1$,

$$\frac{1}{n} \log Z_n \to \log m$$

almost surely on nonextinction, while in the case $m \le 1$,

$$\frac{1}{n} \log P[Z_n > 0] \to \log m .$$
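
The growth statement above can be explored numerically. The following is a minimal sketch, not part of the original argument: the function names and the example offspring law are assumptions chosen for illustration. It simulates the generation sizes $Z_0, \ldots, Z_n$ from a single initial particle, with the offspring law given as a list `[p_0, p_1, ...]`.

```python
import math
import random


def mean_offspring(offspring_probs):
    """Mean m = sum_k k * p_k of an offspring law given as [p_0, p_1, ...]."""
    return sum(k * p for k, p in enumerate(offspring_probs))


def simulate_generation_sizes(offspring_probs, n_generations, rng):
    """Return [Z_0, ..., Z_n] for a process started from a single particle."""
    child_counts = list(range(len(offspring_probs)))
    sizes = [1]
    for _ in range(n_generations):
        current = sizes[-1]
        if current == 0:
            sizes.append(0)  # extinction is absorbing
            continue
        # Every particle draws its number of children independently.
        draws = rng.choices(child_counts, weights=offspring_probs, k=current)
        sizes.append(sum(draws))
    return sizes


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.25, 0.25, 0.5]  # hypothetical supercritical law with m = 1.25
    sizes = simulate_generation_sizes(probs, 40, rng)
    if sizes[-1] > 0:
        # On survival, (1/n) log Z_n should be close to log m for large n.
        print(math.log(sizes[-1]) / 40, "vs", math.log(mean_offspring(probs)))
```

On a surviving run the two printed numbers should be of comparable size; a single run is only suggestive, since the statement is an almost sure limit.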

Finer questions may be asked: 

• In the case $m > 1$, when does the mean $E[Z_n] = m^n$ give the right growth rate up to a random
factor?

• In the case $m < 1$, when does the first moment estimate $P[Z_n > 0] \le E[Z_n] = m^n$ give the
right decay rate up to a random factor?

• In the case $m = 1$, what is the decay rate of $P[Z_n > 0]$?
These questions are answered by the following three classical theorems. 
Theorem A: Supercritical Processes (Kesten and Stigum (1966)).

Suppose that $1 < m < \infty$ and let $W$ be the limit of the martingale $Z_n/m^n$. The following are
equivalent:

(i) $P[W = 0] = q$;
(ii) $E[W] = 1$;
(iii) $E[L \log^+ L] < \infty$.

Theorem B: Subcritical Processes (Heathcote, Seneta and Vere-Jones (1967)).
The sequence $\{P[Z_n > 0]/m^n\}$ is decreasing. If $m < 1$, then the following are equivalent:

(i) $\lim_{n \to \infty} P[Z_n > 0]/m^n > 0$;
(ii) $\sup E[Z_n \mid Z_n > 0] < \infty$;
(iii) $E[L \log^+ L] < \infty$.

The fact that (i) holds if $E[L^2] < \infty$ was proved by Kolmogorov (1938). It is interesting that
the law of $Z_n$ conditioned on $Z_n > 0$ always converges in a strong sense, even when its means are
unbounded; see Section 6.

Theorem C: Critical Processes (Kesten, Ney and Spitzer (1966)).
Suppose that $m = 1$ and let $\sigma^2 := \mathrm{Var}(L) = E[L^2] - 1 \le \infty$. Then we have

(i) Kolmogorov's estimate:

$$\lim_{n \to \infty} n P[Z_n > 0] = \frac{2}{\sigma^2} ;$$

(ii) Yaglom's limit law:

If $\sigma < \infty$, then the conditional distribution of $Z_n/n$ given $Z_n > 0$ converges as $n \to \infty$ to an
exponential law with mean $\sigma^2/2$. If $\sigma = \infty$, then this conditional distribution converges to
infinity.

Under a third moment assumption, parts (i) and (ii) of Theorem C are due to Kolmogorov 
(1938) and Yaglom (1947), respectively. 
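
Kolmogorov's estimate can be checked numerically for a concrete critical law. The sketch below is an illustration under stated assumptions, not the method of this paper: it uses the standard identity $P[Z_n = 0] = f_n(0)$, where $f_n$ is the $n$-fold iterate of the offspring probability generating function $f$. The function names and the example law $p_0 = p_2 = 1/2$ (so $m = 1$, $\sigma^2 = 1$) are hypothetical choices.

```python
def survival_probabilities(offspring_probs, n_generations):
    """Return [P[Z_1 > 0], ..., P[Z_n > 0]] using P[Z_n = 0] = f_n(0)."""

    def pgf(s):
        return sum(p * s**k for k, p in enumerate(offspring_probs))

    s = 0.0
    survival = []
    for _ in range(n_generations):
        s = pgf(s)  # s is now the next iterate of f evaluated at 0
        survival.append(1.0 - s)
    return survival


def offspring_variance(offspring_probs):
    """Variance of an offspring law given as [p_0, p_1, ...]."""
    mean = sum(k * p for k, p in enumerate(offspring_probs))
    second = sum(k * k * p for k, p in enumerate(offspring_probs))
    return second - mean * mean


if __name__ == "__main__":
    probs = [0.5, 0.0, 0.5]  # hypothetical critical law
    n = 10000
    surv = survival_probabilities(probs, n)
    print(n * surv[-1], "vs", 2.0 / offspring_variance(probs))
```

For this law the product $n P[Z_n > 0]$ approaches $2/\sigma^2 = 2$ slowly, with a correction of order $(\log n)/n$.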

For classical proofs of these theorems, the reader is referred to Athreya and Ney (1972), pp. 15-33
and 38-45 or Asmussen and Hering (1983), pp. 23-25, 58-63, and 74-76. A very short proof of
the Kesten-Stigum theorem, using martingale truncation, is in Tanny (1988). 

By using simple measure theory, we reduce the dichotomies between mean and sub-mean 
behavior in the first two theorems to easier known dichotomies concerning the growth of branching 
processes with immigration. These, in turn, arise from the following dichotomy, which is an 
immediate consequence of the Borel-Cantelli lemmas. 

LEMMA 1.1. Let $X, X_1, X_2, \ldots$ be nonnegative i.i.d. random variables. Then

$$\limsup_{n \to \infty} \frac{1}{n} X_n = \begin{cases} 0 & \text{if } E[X] < \infty , \\ \infty & \text{if } E[X] = \infty \end{cases} \quad \text{a.s.}$$
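
One way to see how the Borel-Cantelli lemmas enter is the standard comparison $\sum_{n \ge 1} P[X > \varepsilon n] \le E[X]/\varepsilon \le \sum_{n \ge 0} P[X > \varepsilon n]$, valid for every $\varepsilon > 0$: the events $\{X_n > \varepsilon n\}$ are independent, so they occur infinitely often exactly when the tail sum diverges, that is, when $E[X] = \infty$. The sketch below checks the comparison numerically for an exponential $X$ with mean 1; the function name and the choice of distribution are assumptions made for illustration.

```python
import math


def tail_sum(eps, n_terms=100000):
    """Approximate sum_{n>=1} P[X > eps * n] for X exponential with mean 1."""
    # For this X, P[X > t] = exp(-t); the truncated tail is negligible here.
    return sum(math.exp(-eps * n) for n in range(1, n_terms + 1))


if __name__ == "__main__":
    for eps in (0.1, 1.0, 2.0):
        lower = tail_sum(eps)
        # E[X] / eps = 1 / eps lies between the sum from n = 1 and the sum from n = 0.
        print(eps, lower, 1.0 / eps, 1.0 + lower)
```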
Size-biased distributions, which arise in many contexts, play an important role in the present
paper. Let $X$ be a nonnegative random variable with finite positive mean. Say that $\hat X$ has the
corresponding size-biased distribution if

$$E[g(\hat X)] = \frac{E[X g(X)]}{E[X]}$$

for every positive Borel function $g$. The analogous notion for random trees is the topic of Section 2.

Note that if $X$ is an exponential random variable and $U$ is uniform in $[0, 1]$ and independent of
$\hat X$, then the product $U \cdot \hat X$ has the same distribution as $X$. (One way to see this is by considering
the first and second points of a Poisson process.) The fact that this property actually characterizes
the exponential distributions (Pakes and Khattree (1992)) is used in Section 4 to derive part (ii)
of Theorem C.
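
For a law on the nonnegative integers, size-biasing amounts to reweighting each mass by its value. The following sketch is illustrative only (the helper names and the example law are assumptions): it builds the size-biased law exactly with rational arithmetic and checks the defining identity for one test function $g$.

```python
from fractions import Fraction


def size_bias(pmf):
    """Given {k: P[X = k]} with finite positive mean, return {k: k * P[X = k] / E[X]}."""
    mean = sum(k * p for k, p in pmf.items())
    return {k: k * p / mean for k, p in pmf.items() if k * p != 0}


def expect(pmf, g):
    """E[g(X)] for X with law pmf."""
    return sum(p * g(k) for k, p in pmf.items())


if __name__ == "__main__":
    law = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}  # hypothetical law
    biased = size_bias(law)
    g = lambda k: k * k + 1
    mean = expect(law, lambda k: k)
    # Both sides of E[g(X-hat)] = E[X g(X)] / E[X], computed exactly.
    print(expect(biased, g), expect(law, lambda k: k * g(k)) / mean)
```

Note that the value 0 receives no mass after size-biasing, which is why the size-biased offspring variable used later is always at least 1.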

The next section is basic for the rest of the paper; Sections 3, 4 and 5, which contain the proofs 
of Theorems A, C and B, respectively, may be read independently of each other. An extension of 
Theorem A to branching processes in a random environment, due to Tanny (1988), is discussed in 
the final section. 



§2. Size-biased Trees. 

Our proofs depend on viewing Galton-Watson processes as generating random family trees,
not merely as generating various numbers of particles; of course, this goes back at least to Harris
(1963). We think of these trees as rooted and labeled, with the (distinguishable) offspring of each
vertex ordered from left to right. We shall define another way of growing random trees, called
size-biased Galton-Watson. The law of this random tree will be denoted $\widehat{\mathrm{GW}}$, whereas the
law of an ordinary Galton-Watson tree is denoted $\mathrm{GW}$.

Let $\hat L$ be a random variable whose distribution is that of size-biased $L$, i.e., $P[\hat L = k] = k p_k/m$.
To construct a size-biased Galton-Watson tree $\hat T$, start with an initial particle $v_0$. Give it a random
number $\hat L_1$ of children, where $\hat L_1$ has the law of $\hat L$. Pick one of these children at random, $v_1$.
Give the other children independently ordinary Galton-Watson descendant trees and give $v_1$ an
independent size-biased number $\hat L_2$ of children. Again, pick one of the children of $v_1$ at random,
call it $v_2$, and give the others ordinary Galton-Watson descendant trees. Continue in this way
indefinitely. (See Figure !!!! .) Note that size-biased Galton-Watson trees are always infinite
(there is no extinction).
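
The construction just described can be sketched in code if one only tracks generation sizes. This is a simplified illustration, not the full tree: the function name and parameters are assumptions, and the left-to-right position of the chosen child is ignored because it does not affect the sizes.

```python
import random


def size_biased_generation_sizes(offspring_probs, n_generations, rng):
    """Return [Z_0, ..., Z_n] for a size-biased tree: one spine vertex per level
    plus ordinary Galton-Watson descendants of the off-spine vertices."""
    child_counts = list(range(len(offspring_probs)))
    # Weights proportional to k * p_k give the size-biased offspring law.
    biased_weights = [k * p for k, p in enumerate(offspring_probs)]
    off_spine = 0  # vertices of the current generation that are not on the spine
    sizes = [1]
    for _ in range(n_generations):
        # Off-spine vertices reproduce according to the ordinary offspring law.
        if off_spine > 0:
            ordinary = sum(rng.choices(child_counts, weights=offspring_probs, k=off_spine))
        else:
            ordinary = 0
        # The spine vertex has a size-biased number of children; one continues the spine.
        spine_children = rng.choices(child_counts, weights=biased_weights, k=1)[0]
        off_spine = ordinary + spine_children - 1
        sizes.append(off_spine + 1)
    return sizes


if __name__ == "__main__":
    print(size_biased_generation_sizes([0.5, 0.3, 0.2], 10, random.Random(0)))
```

Every generation contains at least the spine vertex, so the returned sizes are never zero, in line with the remark that there is no extinction.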

Define the measure $\widehat{\mathrm{GW}}_*$ as the joint distribution of the random tree $\hat T$ and the random path
$(v_0, v_1, v_2, \ldots)$. Let $\widehat{\mathrm{GW}}$ be its marginal on the space of trees.

For a tree $t$ with $Z_n$ vertices at level $n$, write $W_n(t) := Z_n/m^n$. For any rooted tree $t$ and
any $n \ge 0$, denote by $[t]_n$ the set of rooted trees whose first $n$ levels agree with those of $t$. (In
particular, if the height of $t$ is less than $n$, then $[t]_n = \{t\}$.) If $v$ is a vertex at the $n$th level of $t$,
then let $[t; v]_n$ denote the set of trees with distinguished paths such that the tree is in $[t]_n$ and
the path starts from the root, does not backtrack, and goes through $v$.

Assume that $t$ is a tree of height at least $n + 1$ and that the root of $t$ has $k$ children with
descendant trees $t^{(1)}, t^{(2)}, \ldots, t^{(k)}$. Any vertex $v$ in level $n + 1$ of $t$ is in one of these, say $t^{(i)}$. The
measure $\widehat{\mathrm{GW}}_*$ clearly satisfies the recursion

$$\widehat{\mathrm{GW}}_*[t; v]_{n+1} = \frac{k p_k}{m} \cdot \frac{1}{k} \cdot \widehat{\mathrm{GW}}_*[t^{(i)}; v]_n \cdot \prod_{j \ne i} \mathrm{GW}[t^{(j)}]_n .$$

By induction, we conclude that

$$\widehat{\mathrm{GW}}_*[t; v]_n = \frac{1}{m^n} \mathrm{GW}[t]_n \qquad (2.1)$$

for all $n$ and all $[t; v]_n$ as above. Therefore,

$$\widehat{\mathrm{GW}}[t]_n = W_n(t) \, \mathrm{GW}[t]_n \qquad (2.2)$$

for all $n$ and all trees $t$. From (2.1) we see that, given the first $n$ levels of the tree $\hat T$, the measure
$\widehat{\mathrm{GW}}_*$ makes the vertex $v_n$ in the random path $(v_0, v_1, \ldots)$ uniformly distributed on the $n$th level
of $\hat T$.

The vertices off the "spine" $(v_0, v_1, \ldots)$ of the size-biased tree form a branching process with
immigration. In general, such a process is defined by two distributions, an offspring distribution
and an immigration distribution. The process starts with no particles, say, and at every generation
$n \ge 1$, there is an immigration of $Y_n$ particles, where $Y_n$ are i.i.d. with the given immigration law.
Meanwhile, each particle has, independently, an ordinary Galton-Watson descendant tree with the
given offspring distribution.

Thus, the $\widehat{\mathrm{GW}}$-law of $Z_n - 1$ is the same as that of the generation sizes of an immigration
process with $Y_n = \hat L_n - 1$. The probabilistic content of the assumption $E[L \log^+ L] < \infty$ will arise
in applying Lemma 1.1 to the variables $\{\log^+ Y_n\}$, since $E[\log^+ (\hat L - 1)] = m^{-1} E[L \log^+ (L - 1)]$.
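
The immigration description gives a way to compute the size-biased law of $Z_n$ exactly for a small example, and to compare it with $k \cdot \mathrm{GW}(Z_n = k)/m^n$, the reweighting of the ordinary law by $W_n$. The sketch below is a sanity check under assumptions, not part of the proofs: the function names are hypothetical, the offspring law has finite support, and all arithmetic is done with exact fractions.

```python
from fractions import Fraction


def convolve(a, b):
    """Law of the sum of two independent variables with laws a and b (lists indexed by value)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def offspring_sum_law(parent_law, offspring):
    """Law of the total number of children when the number of parents has law parent_law."""
    total = [Fraction(0)]
    z_fold = [Fraction(1)]  # law of a sum of zero offspring counts
    for weight in parent_law:  # weight = P[number of parents = z] for z = 0, 1, ...
        width = max(len(total), len(z_fold))
        total = total + [Fraction(0)] * (width - len(total))
        for i, x in enumerate(z_fold):
            total[i] += weight * x
        z_fold = convolve(z_fold, offspring)
    return total


def gw_law(offspring, n):
    """Law of Z_n for the ordinary process started from one particle."""
    law = [Fraction(0), Fraction(1)]
    for _ in range(n):
        law = offspring_sum_law(law, offspring)
    return law


def size_biased_law(offspring, n):
    """Law of Z_n for the size-biased tree, via immigration of (L-hat - 1) per level."""
    m = sum(k * p for k, p in enumerate(offspring))
    immigration = [k * p / m for k, p in enumerate(offspring)][1:]  # law of L-hat - 1
    off_spine = [Fraction(1)]  # no off-spine vertices at level 0
    for _ in range(n):
        off_spine = convolve(offspring_sum_law(off_spine, offspring), immigration)
    return [Fraction(0)] + off_spine  # add the spine vertex itself


if __name__ == "__main__":
    p = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]  # hypothetical law, m = 5/4
    m, n = Fraction(5, 4), 3
    gw, sb = gw_law(p, n), size_biased_law(p, n)
    print(all(sb[k] == k * gw[k] / m**n for k in range(len(gw))))
```

Exact rational arithmetic is used so that the two laws can be compared with equality rather than a tolerance.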

The construction of size-biased trees is not new. It and related constructions in other situations 
occur in Kahane and Peyriere (1976), Kallenberg (1977), Hawkes (1981), Rouault (1981), Joffe and 
Waugh (1982), Kesten (1986), Chauvin and Rouault (1988), Chauvin, Rouault and Wakolbinger 
(1991), and Waymire and Williams (1993). The paper of Waymire and Williams (1993) is the 
only one among these to use such a construction in a similar way to the method we use to prove 
Theorem A; their work was independent of and contemporaneous with ours. None of these papers 
use methods similar to the ones we employ for the proofs of Theorems B and C. An a priori 
motivation for the use of size-biased trees in our context comes from the general principle that 
in order to study asymptotics, it is useful to construct a suitable limiting object first. In the 
supercritical case, to study the asymptotic behavior of the martingale $W_n := Z_n/m^n$ with respect
to $\mathrm{GW}$, it is natural to consider the sequence of measures $W_n \, d\mathrm{GW}$, which converge weakly to
$\widehat{\mathrm{GW}}$. As pointed out by the editor, this can also be viewed as a Doob $h$-transform. When $m \le 1$,
the size-biased tree may be obtained by conditioning a Galton-Watson tree to survive forever. The
generation sizes of size-biased Galton-Watson trees are known as a Q-process in the case $m \le 1$;
see Athreya-Ney (1972), pp. 56-60. One may also view $\widehat{\mathrm{GW}}_*$ as a Campbell measure and $\widehat{\mathrm{GW}}$ as
the associated Palm measure.



§3. Supercritical Processes: Proof of Theorem A.



Theorem A will be an immediate consequence of the following theorem on immigration pro- 
cesses. 

Theorem 3.1. (Seneta (1970)) Let $Z_n$ be the generation sizes of a Galton-Watson process with
immigration $Y_n$. Let $m := E[L] > 1$ be the mean of the offspring law and let $Y$ have the same law
as $Y_n$. If $E[\log^+ Y] < \infty$, then $\lim Z_n/m^n$ exists and is finite a.s., while if $E[\log^+ Y] = \infty$, then
$\limsup Z_n/c^n = \infty$ a.s. for every constant $c > 0$.

Proof. (Asmussen and Hering (1983), pp. 50-51) Assume first that $E[\log^+ Y] = \infty$. By Lemma 1.1,
$\limsup Y_n/c^n = \infty$ a.s. Since $Z_n \ge Y_n$, the result follows.

Now assume that $E[\log^+ Y] < \infty$. Let $\mathcal{Y}$ be the $\sigma$-field generated by $\{Y_k ; k \ge 1\}$. Let $Z_{n,k}$
be the number of descendants at level $n$ of the vertices which immigrated in generation $k$. Thus,
the total number of vertices at level $n$ is $\sum_{k=1}^{n} Z_{n,k}$. This gives



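$$E[Z_n/m^n \mid \mathcal{Y}] = E\left[ \frac{1}{m^n} \sum_{k=1}^{n} Z_{n,k} \;\Big|\; \mathcal{Y} \right] = \sum_{k=1}^{n} \frac{1}{m^k} \, E\left[ \frac{Z_{n,k}}{m^{n-k}} \;\Big|\; \mathcal{Y} \right] .$$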



Now, for $k < n$, the random variable $Z_{n,k}/m^{n-k}$ is the $(n - k)$th element of the ordinary
Galton-Watson martingale sequence starting with, however, $Y_k$ particles. Therefore, its expectation is just
$Y_k$ and so

$$E[Z_n/m^n \mid \mathcal{Y}] = \sum_{k=1}^{n} \frac{Y_k}{m^k} .$$

Our assumption gives, by Lemma 1.1, that $Y_k$ grows subexponentially, whence this series converges
a.s. Since $\{Z_n/m^n\}$ is a submartingale when conditioned on $\mathcal{Y}$ with bounded expectation (given
$\mathcal{Y}$), it converges a.s. ∎
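
The conditional mean in this proof is a plain weighted sum of the immigration counts, with weight $m^{-k}$ on the immigrants of generation $k$. The small sketch below makes this concrete; the function name and the example sequence $Y_k = k^2$ are assumptions for illustration, chosen because a polynomially growing sequence is subexponential and so the series converges.

```python
def conditional_normalized_mean(immigrants, m):
    """Return sum_{k=1}^{n} Y_k / m**k for immigrants = [Y_1, ..., Y_n]."""
    return sum(y / m**k for k, y in enumerate(immigrants, start=1))


if __name__ == "__main__":
    m = 2.0
    for n in (5, 10, 20, 60):
        polynomial = [k * k for k in range(1, n + 1)]  # subexponential growth
        # The partial sums stabilise; for Y_k = k^2 and m = 2 the series sums to 6.
        print(n, conditional_normalized_mean(polynomial, m))
```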

To prove Theorem A, recall the following elementary result, whose proof we include for the 
sake of completeness: 

PROPOSITION 3.2. Either $W = 0$ a.s. or $W > 0$ a.s. on nonextinction. In other words,
$P[W = 0] \in \{q, 1\}$.

Proof. Let $f(s) := E[s^L]$ be the probability generating function of $L$. The roots of $f(s) = s$ in
$[0, 1]$ are $\{q, 1\}$. Thus, it suffices to show that $P[W = 0]$ is such a root. Now the $i$th individual of
the first generation has a descendant Galton-Watson tree with, therefore, a martingale limit, $W^{(i)}$
say. These are independent and have the same distribution as $W$. Furthermore,

$$W = \frac{1}{m} \sum_{i=1}^{Z_1} W^{(i)} ,$$

or, what counts for our purposes,

$$W = 0 \iff \forall i \le Z_1 \;\; W^{(i)} = 0 .$$

Conditioning on $Z_1$ now gives immediately the desired fact that $f(P[W = 0]) = P[W = 0]$. ∎

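Proof of Theorem A. Rewrite (2.2) as follows. Let $\mathcal{F}_n$ be the $\sigma$-field generated by the first $n$ levels
of trees and $\mathrm{GW}_n$, $\widehat{\mathrm{GW}}_n$ be the restrictions of $\mathrm{GW}$, $\widehat{\mathrm{GW}}$ to $\mathcal{F}_n$. Then (2.2) is the same as

$$\frac{d\widehat{\mathrm{GW}}_n}{d\mathrm{GW}_n}(t) = W_n(t) . \qquad (3.1)$$

It is convenient now to interpret the last expression for infinite trees $t$, where both sides depend
only on the first $n$ levels of $t$. In order to define $W$ for every infinite tree $t$, set

$$W(t) := \limsup_{n \to \infty} W_n(t) .$$

From (3.1) follows the key dichotomy:

$$W = 0 \;\; \mathrm{GW}\text{-a.s.} \iff \mathrm{GW} \perp \widehat{\mathrm{GW}} \iff W = \infty \;\; \widehat{\mathrm{GW}}\text{-a.s.} \qquad (3.2)$$

while

$$\int W \, d\mathrm{GW} = 1 \iff \widehat{\mathrm{GW}} \ll \mathrm{GW} \iff W < \infty \;\; \widehat{\mathrm{GW}}\text{-a.s.} \qquad (3.3)$$

(see Durrett (1991), p. 210, Exercise 3.6). This is the key because it allows us to change the problem
from one about the $\mathrm{GW}$-behavior of $W$ to one about the $\widehat{\mathrm{GW}}$-behavior of $W$. Indeed, since the
$\widehat{\mathrm{GW}}$-behavior of $W$ is described by Theorem 3.1, the theorem is immediate: if $E[L \log^+ L] < \infty$,
i.e., $E[\log^+ \hat L] < \infty$, then $W < \infty$ $\widehat{\mathrm{GW}}$-a.s. by Theorem 3.1, whence $\int W \, d\mathrm{GW} = 1$ by (3.3);
while if $E[L \log^+ L] = \infty$, then $W = \infty$ $\widehat{\mathrm{GW}}$-a.s. by Theorem 3.1, whence $W = 0$ $\mathrm{GW}$-a.s. by
(3.2). ∎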
§4. Critical Processes: Proof of Theorem C.

LEMMA 4.1. Consider a critical Galton-Watson process with a random number $Y \ge 1$ of initial
particles in generation 0. Choose one of the initial particles, $v$, at random. Let $B_n$ be the event
that at least one of the particles to the left of $v$ has a descendant in generation $n$ and let $r_n$ be
the number of descendants in generation $n$ of the particles to the right of $v$. Let $\beta_n := P[B_n]$ and
$\alpha_n := E[r_n \mathbf{1}_{B_n}]$. Then $\lim_{n \to \infty} \beta_n = 0$ and, if $E[Y] < \infty$, then $\lim_{n \to \infty} \alpha_n = 0$.

Proof. The fact that $\beta_n \to 0$ follows from writing $P[B_n] = E[P[B_n \mid Y, v]]$ and applying the
bounded convergence theorem. Now

$$\alpha_n = E[E[r_n \mathbf{1}_{B_n} \mid Y, v]] = E[E[r_n \mid Y, v] \, P[B_n \mid Y, v]] \le E[Y \, P[B_n \mid Y, v]] = E[Y \mathbf{1}_{B_n}]$$

by independence of $r_n$ and $B_n$ given $Y$ and $v$. Thus, $E[Y] < \infty$ implies that $\alpha_n \to 0$. ∎

Proof of Theorem C (i). Let $A_n$ be the event that $v_n$ is the leftmost vertex in generation $n$. By
definition, $\widehat{\mathrm{GW}}_*(A_n \mid Z_n) = 1/Z_n$. From this, it follows that conditioning on $A_n$ reverses the
effect of size-biasing. That is, the law of the first $n$ generations of a tree under $(\widehat{\mathrm{GW}}_* \mid A_n)$ is the
same as under $(\mathrm{GW} \mid Z_n > 0)$. In particular,

$$\int Z_n \, d(\widehat{\mathrm{GW}}_* \mid A_n) = \int Z_n \, d(\mathrm{GW} \mid Z_n > 0) = \frac{1}{\mathrm{GW}(Z_n > 0)} .$$

We are thus required to show that

$$\frac{1}{n} \int Z_n \, d(\widehat{\mathrm{GW}}_* \mid A_n) \to \frac{\sigma^2}{2} . \qquad (4.1)$$

For any tree with a distinguished line of descendants $v_0, v_1, \ldots$, decompose the size of the $n$th
generation by writing $Z_n = 1 + \sum_{j=1}^{n} Z_{n,j}$, where $Z_{n,j}$ is the number of vertices at generation
$n$ descended from $v_{j-1}$ but not from $v_j$. The intuition behind (4.1) is that the unconditional
$\widehat{\mathrm{GW}}_*$-expectation of $Z_{n,j}$ is $E[\hat L] - 1 = \sigma^2$; half of these fall to the left of $v_n$ and half to the right.
Since the chance that any given vertex at generation $n - k$ other than $v_{n-k}$ has no descendant in
generation $n$ tends to 1 as $k \to \infty$, conditioning on none surviving to the left leaves us with $\sigma^2/2$.

To prove this, define $R_{n,j}$ to be the number of vertices in generation $n$ descended from those
children of $v_{j-1}$ to the right of $v_j$ and $R_n := 1 + \sum_{j=1}^{n} R_{n,j}$, the number of vertices in generation
$n$ to the right of $v_n$, inclusive. Let $A_{n,j}$ be the event that $R_{n,j} = Z_{n,j}$. Let $R'_{n,j}$ be independent
random variables with respect to a probability measure $Q'$ such that $R'_{n,j}$ has the
$(\widehat{\mathrm{GW}}_* \mid A_{n,j})$-distribution of $R_{n,j}$. Let $Q := \widehat{\mathrm{GW}}_* \times Q'$. Define

$$R^*_{n,j} := R_{n,j} \mathbf{1}_{A_{n,j}} + R'_{n,j} \mathbf{1}_{\neg A_{n,j}} ,$$

where $\neg$ denotes complement. Then the random variable $R^*_n := 1 + \sum_{j=1}^{n} R^*_{n,j}$ has the same
distribution as the $(\widehat{\mathrm{GW}}_* \mid A_n)$-law of $Z_n$ since the event $A_n = \{R_n = Z_n\}$ is the intersection of
the independent events $A_{n,j}$. Also,

$$\int R_{n,j} \, d(\widehat{\mathrm{GW}}_* \mid A_{n,j}) \le \int Z_{n,j} \, d(\widehat{\mathrm{GW}}_* \mid A_{n,j}) \le \int Z_{n,j} \, d\widehat{\mathrm{GW}}_* = E[\hat L] - 1 = \sigma^2 ,$$

where the second inequality is due to $Z_{n,j}$ and the indicator of $A_{n,j}$ being negatively correlated.
Now, for each $j$, we apply Lemma 4.1 with $Y = \hat L_j$ to the descendant trees of the children of $v_{j-1}$,
with $\neg A_{n,j}$ playing the role of $B_{n-j}$. We conclude that if $\sigma < \infty$, then

$$\frac{1}{n} \int \sum_{j=1}^{n} |R_{n,j} - R^*_{n,j}| \, dQ \le \frac{1}{n} \sum_{j=1}^{n} \int (R_{n,j} + R'_{n,j}) \mathbf{1}_{\neg A_{n,j}} \, dQ \le \frac{1}{n} \sum_{j=1}^{n} (\alpha_{n-j} + \sigma^2 \beta_{n-j}) \to 0 \qquad (4.2)$$

as $n \to \infty$. In particular, since $\int R_{n,j} \, d\widehat{\mathrm{GW}}_* = \sigma^2/2$ and hence $\int R_n/n \, d\widehat{\mathrm{GW}}_* \to \sigma^2/2$, we get
$\int R^*_n/n \, dQ \to \sigma^2/2$. The case $\sigma = \infty$ follows from this by truncating the $\hat L_k$ while leaving unchanged
the rest of the size-biased tree. This shows (4.1), as desired. ∎

The following simple characterization of the exponential distributions is used to prove part 
(ii) of the theorem. 

Lemma 4.2. (Pakes and Khattree (1992)) Let $X$ be a nonnegative random variable with
a positive finite mean and let $\hat X$ have the corresponding size-biased distribution. Denote by $U$ a
uniform random variable in $[0, 1]$ which is independent of $\hat X$. Then $U \cdot \hat X$ and $X$ have the same
distribution iff $X$ is exponential.

Proof. By linearity, we may assume that $E[X] = 1$. For any $\lambda > 0$, we have

$$E\left[ e^{-\lambda U \cdot \hat X} \right] = E\left[ \int_0^1 X e^{-\lambda u X} \, du \right] = \frac{1}{\lambda} E\left[ 1 - e^{-\lambda X} \right] ,$$

which equals $E[e^{-\lambda X}]$ iff $E[e^{-\lambda X}] = 1/(\lambda + 1)$. By uniqueness of the Laplace transform, this holds
for every $\lambda > 0$ iff $X$ is exponential with mean 1. ∎
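
The Laplace-transform identity in this proof is easy to test on closed forms. The sketch below is an illustration under assumptions (the function names and the choice of a uniform law on $[0, 2]$ as a non-exponential comparison with mean 1 are hypothetical): for the exponential law the transform of $U \cdot \hat X$ agrees with that of $X$, while for the uniform law it does not.

```python
import math


def laplace_of_u_times_size_biased(laplace_x, lam):
    """E[exp(-lam * U * X-hat)] = (1 / lam) * (1 - E[exp(-lam * X)]), valid when E[X] = 1."""
    return (1.0 - laplace_x(lam)) / lam


def laplace_exponential(lam):
    """Laplace transform of an exponential variable with mean 1."""
    return 1.0 / (1.0 + lam)


def laplace_uniform_0_2(lam):
    """Laplace transform of a uniform variable on [0, 2], which also has mean 1."""
    return (1.0 - math.exp(-2.0 * lam)) / (2.0 * lam)


if __name__ == "__main__":
    for lam in (0.5, 1.0, 3.0):
        # Exponential: the two transforms coincide. Uniform: they differ.
        print(lam,
              laplace_of_u_times_size_biased(laplace_exponential, lam) - laplace_exponential(lam),
              laplace_of_u_times_size_biased(laplace_uniform_0_2, lam) - laplace_uniform_0_2(lam))
```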

The following lemma is elementary. 

LEMMA 4.3. Suppose that $X$, $X_n$ are nonnegative random variables with positive finite means
such that $X_n \to X$ in law and $\hat X_n \to Y$ in law. If $Y$ is a proper random variable, then $Y$ has the
law of $\hat X$.

Proof of Theorem C (ii). Suppose first that $\sigma < \infty$. This ensures that the $\widehat{\mathrm{GW}}$-laws of $Z_n/n$
have uniformly bounded means and, a fortiori, are tight. Let $R_n$ and $R^*_n$ be as in the proof of part
(i). Then $R^*_n/n$ also have uniformly bounded means and hence are tight. Therefore, there is a
sequence $\{n_k\}$ tending to infinity such that $R^*_{n_k}/n_k$ converges in law to a (proper) random variable
$X$ and the $\widehat{\mathrm{GW}}$-laws of $Z_{n_k}/n_k$ converge to the law of a (proper) random variable $Y$. Note that
the law of $R^*_{n_k}$ is the $(\mathrm{GW} \mid Z_{n_k} > 0)$-law of $Z_{n_k}$. Thus, from Lemma 4.3 combined with (4.1), the
variables $Y$ and $\hat X$ are identically distributed. Also, by (4.2), the $\widehat{\mathrm{GW}}_*$-laws of $R_{n_k}/n_k$ tend to
the law of $X$.

On the other hand, let $U$ be a uniform $[0, 1]$-valued random variable independent of every
other random variable encountered so far. Then $R_n$ and $\lceil U \cdot Z_n \rceil$ have the same law (with respect
to $\widehat{\mathrm{GW}}_*$), while

$$\left| \frac{1}{n} \lceil U \cdot Z_n \rceil - \frac{1}{n} U \cdot Z_n \right| \le \frac{1}{n} \to 0 .$$

Hence $X$ and $U \cdot \hat X$ have the same distribution. It follows from Lemma 4.2 and (4.1) that $X$ is an
exponential random variable with mean $\sigma^2/2$. In particular, the limiting distribution of $R^*_{n_k}/n_k$
is independent of the sequence $n_k$, and hence we actually have convergence in law of the whole
sequence $R^*_n/n$ to $X$, as desired.

Now suppose that $\sigma = \infty$. A truncation argument shows that the $\widehat{\mathrm{GW}}$-laws of $Z_n/n$ tend to
infinity, whence so do the laws of $\lceil U \cdot Z_n \rceil/n$. Thus, the $(\mathrm{GW} \mid Z_n > 0)$-laws of $Z_n/n$ tend to
infinity as well. ∎

Remark. The fact that the limit $\widehat{\mathrm{GW}}$-law of $Z_n/n$ is that of $\hat X$, i.e., the sum of two independent
exponentials with mean $\sigma^2/2$ each, is due to Harris (see Athreya and Ney (1972), pp. 59-60). The
above proof allows us to identify these two exponentials as normalized counts of the vertices to the
left and right of the "spine" $(v_0, v_1, \ldots)$.
§5. Subcritical Processes: Proof of Theorem B.

Let $\mu_n$ be the law of $Z_n$ conditioned on $Z_n > 0$. For any tree $t$, let $\xi_n(t)$ be the leftmost vertex
in the first generation having at least one descendant in generation $n$ if $Z_n > 0$. Let $H_n(t)$ be the
number of descendants of $\xi_n(t)$ in generation $n$, or zero if $Z_n = 0$. It is easy to see that

$$\mathrm{GW}(H_n = k \mid Z_n > 0) = \mathrm{GW}(H_n = k \mid Z_n > 0, \xi_n = x) = \mathrm{GW}(Z_{n-1} = k \mid Z_{n-1} > 0)$$

for all children $x$ of the root. Since $H_n \le Z_n$, this shows that $\{\mu_n\}$ increases stochastically in $n$.
Now

$$\mathrm{GW}(Z_n > 0) = \frac{E[Z_n]}{E[Z_n \mid Z_n > 0]} = \frac{m^n}{\int x \, d\mu_n(x)} .$$

Therefore, $\mathrm{GW}(Z_n > 0)/m^n$ is decreasing and (i) $\iff$ (ii). The equivalence of (ii) and (iii) is an
immediate consequence of the following routine lemma applied to the laws $\mu_n$ and the following
theorem on immigration. ∎
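
The monotonicity of $P[Z_n > 0]/m^n$ can be observed numerically for a concrete subcritical law. The sketch below is an illustration only: it relies on the standard identity $P[Z_n = 0] = f_n(0)$ for the iterated probability generating function rather than on the tree argument above, and the function name and the example law are assumptions.

```python
def normalized_survival(offspring_probs, n_generations):
    """Return [P[Z_n > 0] / m**n for n = 1, ..., n_generations] via P[Z_n = 0] = f_n(0)."""
    m = sum(k * p for k, p in enumerate(offspring_probs))

    def pgf(s):
        return sum(p * s**k for k, p in enumerate(offspring_probs))

    s = 0.0
    ratios = []
    for n in range(1, n_generations + 1):
        s = pgf(s)  # next iterate of the generating function at 0
        ratios.append((1.0 - s) / m**n)
    return ratios


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.2]  # hypothetical subcritical law with m = 0.7
    ratios = normalized_survival(probs, 30)
    # The ratios decrease; for a bounded offspring law the limit is positive.
    print(ratios[0], ratios[-1])
```

The example law has bounded support, so the L log L condition holds trivially and the ratios level off at a positive value.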

LEMMA 5.1. Let $\{\nu_n\}$ be a sequence of probability measures on the positive integers with finite
means $a_n$. Let $\hat\nu_n$ be size-biased, i.e., $\hat\nu_n(k) = k \nu_n(k)/a_n$. If $\{\hat\nu_n\}$ is tight, then $\sup a_n < \infty$, while
if $\hat\nu_n \to \infty$ in distribution, then $a_n \to \infty$.

Theorem 5.2. (Heathcote (1966)) Let $Z_n$ be the generation sizes of a Galton-Watson process
with offspring random variable $L$ and immigration $Y_n$. Suppose that $m := E[L] < 1$ and let $Y$
have the same law as $Y_n$. If $E[\log^+ Y] < \infty$, then $Z_n$ converges in distribution to a proper random
variable, while if $E[\log^+ Y] = \infty$, then $Z_n$ converges in probability to infinity.

The following proof is a slight improvement on Asmussen and Hering (1983), pp. 52-53. 

Proof. Let $\mathcal{Y}$ be the $\sigma$-field generated by $\{Y_k ; k \ge 1\}$. For any $n$, let $Z_{n,k}$ be the number of
descendants at level $n$ of the vertices which immigrated in generation $k$. Thus, the total number
of vertices at level $n$ is $\sum_{k=1}^{n} Z_{n,k}$. Since the distribution of $Z_{n,k}$ depends only on $n - k$, this total
$Z_n$ has the same distribution as $\sum_{k=1}^{n} Z_{2k-1,k}$, which is an increasing process with limit $Z'_\infty$. By
Kolmogorov's zero-one law, $Z'_\infty$ is a.s. finite or a.s. infinite. Hence, we need only to show that
$Z'_\infty < \infty$ a.s. iff $E[\log^+ Y] < \infty$.

Assume that $E[\log^+ Y] < \infty$. Now $E[Z'_\infty \mid \mathcal{Y}] = \sum_{k=1}^{\infty} Y_k m^{k-1}$. Since $\{Y_k\}$ is almost surely
subexponential in $k$ by Lemma 1.1, this sum converges a.s. Therefore, $Z'_\infty$ is finite a.s.

Now assume that $Z'_\infty < \infty$ a.s. Writing $Z_{2k-1,k} = \sum_{i=1}^{Y_k} \zeta_k(i)$, where $\zeta_k(i)$ are the sizes of
generation $k - 1$ of i.i.d. Galton-Watson branching processes with one initial particle, we have
$Z'_\infty = \sum_{k=1}^{\infty} \sum_{i=1}^{Y_k} \zeta_k(i)$ written as a random sum of independent random variables. Only a finite
number of them are at least one, whence by the Borel-Cantelli lemma conditioned on $\mathcal{Y}$, we get
$\sum_{k=1}^{\infty} Y_k \, \mathrm{GW}(Z_{k-1} \ge 1) < \infty$ a.s. Since $\mathrm{GW}(Z_{k-1} \ge 1) \ge P[L > 0]^{k-1}$, it follows by Lemma 1.1
that $E[\log^+ Y] < \infty$. ∎

§6. Strong Convergence of the Conditioned Process in the Subcritical Case. 

Yaglom (1947) showed that when $m < 1$ and $Z_1$ has a finite second moment, the conditional
distribution $\mu_n$ of $Z_n$ given $\{Z_n > 0\}$ converges to a proper probability distribution as $n \to \infty$.
This was proved without the second moment assumption by Joffe (1967) and by Heathcote, Seneta
and Vere-Jones (1967). The following stronger convergence result was proved, in an equivalent
form, by Williamson (cf. Athreya-Ney (1972), pp. 64-65).

Theorem 6.1. The sequence $\{\mu_n\}$ always converges in a strong sense: if $\| \cdot \|$ denotes total
variation norm, then $\sum_n \|\mu_n - \mu_{n-1}\| < \infty$.

Remark. Note that this is strictly stronger than weak convergence to a probability measure, even 
for a stochastically increasing sequence of distributions. 

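Proof of Theorem 6.1. Recalling the notation of the previous section and the events $A_{n,j}$ from
Section 4, we see that

$$\tfrac{1}{2} \|\mu_n - \mu_{n-1}\| \le \mathrm{GW}(H_n \ne Z_n \mid Z_n > 0) = \widehat{\mathrm{GW}}_*(H_n \ne Z_n \mid A_n) = \widehat{\mathrm{GW}}_*(H_n \ne Z_n \mid A_{n,1}) .$$

Let $\lambda$ be the number of children of the root to the left of $v_1$ and let $s_n := \mathrm{GW}(Z_n > 0)$. Now
condition on $\hat L_1$ and $\lambda$ and use the fact that $\inf_{n \ge 2} \widehat{\mathrm{GW}}_*(A_{n,1}) =: \delta > 0$ to estimate

$$\widehat{\mathrm{GW}}_*(H_n \ne Z_n \mid A_{n,1}) \le \delta^{-1} \, \widehat{\mathrm{GW}}_*(\{H_n \ne Z_n\} \cap A_{n,1})$$

$$= \delta^{-1} \sum_{k=1}^{\infty} \sum_{l=0}^{k-1} \widehat{\mathrm{GW}}_*(\hat L_1 = k, \lambda = l, H_n \ne Z_n, A_{n,1})$$

$$= \delta^{-1} \sum_{k=1}^{\infty} \frac{k p_k}{m} \sum_{l=0}^{k-1} \frac{1}{k} (1 - s_{n-1})^{l} \left[ 1 - (1 - s_{n-1})^{k-1-l} \right] .$$

Sum this in $n$ by breaking it into two pieces: those $n$ for which $s_{n-1}^{-1} < k$ and those for which
$s_{n-1}^{-1} \ge k$. For the first piece, use $\sum_{l=0}^{k-1} (1 - s_{n-1})^{l} \le s_{n-1}^{-1}$, and for the second piece, use

$$\sum_{l=0}^{k-1} \left[ 1 - (1 - s_{n-1})^{k-1-l} \right] \le \sum_{l=0}^{k-1} (k - 1 - l) s_{n-1} \le k^2 s_{n-1}/2 .$$

These estimates yield

$$\sum_{n=1}^{\infty} \|\mu_n - \mu_{n-1}\| \le 2 \delta^{-1} \sum_{k} \frac{p_k}{m} \left[ \sum_{s_{n-1}^{-1} < k} s_{n-1}^{-1} + \sum_{s_{n-1} \le 1/k} k^2 s_{n-1}/2 \right] .$$

By virtue of Theorem B, we have $s_j \le m s_{j-1}$, so that each of these two inner sums is bounded by
a geometric series, whence the total is finite. ∎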

§7. Stationary Random Environments. 

The analogue of the Kesten-Stigum theorem for branching processes in random environments 
(BPRE's) is due to Tanny (1988). Our method of proof applies to this situation too. Further- 
more, the probabilistic construction of size-biased trees makes apparent how to remove a technical 
hypothesis in Tanny's extension of the Kesten-Stigum theorem. The analytic tool needed for this 
turns out to be an ergodic lemma due to Tanny (1974). 

In this set-up, the fixed "environment" $f$ is replaced by a stationary ergodic sequence $f_n$ of
random environments. Vertices in generation $n - 1$ have offspring according to the law with p.g.f.
$f_n$. Assume that the process is supercritical with finite growth, i.e., $0 < E[\log f_1'(1)] < \infty$. Write
$M_n := \prod_{k=1}^{n} f_k'(1)$; this is the conditional mean of $Z_n$ given the environment sequence $\mathbf{f} := (f_k)$.
The quotients $Z_n/M_n$ still form a martingale with a.s. limit $W$. We shall also use our notation
that $\hat L_n$ are random variables which, given $\mathbf{f}$, are independent and have size-biased distributions,
so that $f_n'(s) = f_n'(1) \sum_{k \ge 1} P[\hat L_n = k] s^{k-1}$. Tanny's (1988) theorem with the technical hypothesis
removed is as follows.

Theorem 7.1. If for some $a > 0$, the sum $\sum_n P[\hat L_n > a^n \mid \mathbf{f}]$ is finite a.s., then $E[W] = 1$ and
$W \ne 0$ a.s. on nonextinction, while if this sum is infinite with positive probability for some $a > 0$,
then $W = 0$ a.s. In case the environments $f_n$ are i.i.d., the a.s. finiteness of this sum for some $a$
is equivalent to the finiteness of $E[\log^+ \hat L_1]$.

In the proof of Theorem 7.1, for the case of i.i.d. environments, Lemma 1.1 is used in the same 
way as it is for a fixed environment. However, the general case requires the following extension of 
Lemma 1.1. 




Lemma 7.2. (Tanny (1974)) Let $\tau$ be an ergodic measure-preserving transformation of a
probability space $(X, \mu)$, and let $f$ be a nonnegative measurable function on $X$. Then $\limsup f(\tau^n(x))/n$
is either $0$ a.s. or $\infty$ a.s.

The following argument, indicated to us by Jack Feldman, is considerably shorter than the 
proofs in Tanny (1974) and Wos (1987). It is similar to but somewhat shorter than the proof in 
O'Brien (1982). 

Proof of Lemma 7.2. By ergodicity, $\limsup f(\tau^n(x))/n$ is a.s. a constant $c \le \infty$. Suppose that
$0 < c < \infty$. Then we may choose $L$ so that $A := \{x ; \forall k \ge L \;\; f(\tau^k(x))/k < 2c\}$ has $\mu(A) > 9/10$.
The ergodic theorem applied to $\mathbf{1}_A$ implies that for a.e. $x$, if $n$ is sufficiently large, then there exists
$k \in [L, n/5]$ such that $y := \tau^{n-k}(x) \in A$, and therefore

$$\frac{f(\tau^n(x))}{n} = \frac{f(\tau^k(y))}{k} \cdot \frac{k}{n} < \frac{2c}{5} .$$

This contradicts the definition of $c$. ∎

The role of Lemma 7.2 in the proof of Theorem 7.1 is not large: By Lemma 7.2, we know that
$\limsup (1/n) \log^+ \hat L_n$ is a.s. $0$ or a.s. infinite. Since the random variables $\hat L_n$ are independent given
the environment $\mathbf{f}$, the Borel-Cantelli lemma still shows that which of these alternatives holds is
determined by whether the sum $\sum_n P[\hat L_n > a^n \mid \mathbf{f}]$ is finite for every $a > 1$ or not. In particular,
this sum is finite with positive probability for some $a$ if and only if it is finite a.s. for every $a$.

The proof that if $\limsup (1/n) \log^+ \hat L_n = \infty$ a.s., then $W = 0$ a.s. applies without change to
the case of random environments. The proof in the other direction needs only conditioning on $\mathbf{f}$
in addition to conditioning on the $\sigma$-field $\mathcal{Y}$.

The fact that if $E[W] > 0$, then $W \ne 0$ a.s. on nonextinction for BPRE's can be proved
by virtually the same method as that of Proposition 3.2, using the uniqueness of the conditional
extinction probability (Athreya and Karlin 1971).

Acknowledgement: We are grateful to Jack Feldman for permitting the inclusion of his proof of 
Lemma 7.2 in this paper. 